This plot shows only the most frequent words used across the entire corpus of tweets.
This plot shows only the most frequent words used across the entire corpus of tweets, grouped by Twitter handle.
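A minimal sketch of how the word-frequency counts behind these plots can be produced with tidytext. The `tweets` data frame and its column names (`screen_name`, `text`) are assumptions for illustration, not taken from the original analysis.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: a tweets data frame with screen_name and text columns.
word_counts <- tweets %>%
  unnest_tokens(word, text) %>%            # one row per word
  anti_join(stop_words, by = "word") %>%   # drop common stop words
  count(screen_name, word, sort = TRUE)    # frequency per Twitter handle

# Top 10 words per handle, ready for plotting.
top_words <- word_counts %>%
  group_by(screen_name) %>%
  slice_max(n, n = 10)
```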
Words are paired with their adjacent words to form two-word groups (bigrams), and these groups are then counted.
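The two-word grouping described above can be sketched with tidytext's n-gram tokenizer. Again, the `tweets` data frame and its `text` column are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(tidytext)

bigram_counts <- tweets %>%
  # Tokenize into overlapping two-word groups (bigrams).
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # Split each bigram so stop words can be filtered on either side.
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Count how often each remaining pair occurs.
  count(word1, word2, sort = TRUE)
```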
Using the “ldatuning” package, four metrics — “Griffiths2004”, “CaoJuan2009”, “Arun2010”, and “Deveaud2014” — are computed to select the optimal number of topics (K) for an LDA model. The number of CPU cores to use can be specified for better performance when running this method; the larger the dataset, the longer the calculation takes. For more information on this method and the various metrics for obtaining the optimal K, see: https://cran.r-project.org/web/packages/ldatuning/vignettes/topics.html or https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/
Rajkumar Arun, V. Suresh, C. E. Veni Madhavan, and M. N. Narasimha Murthy. 2010. On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Advances in knowledge discovery and data mining, Mohammed J. Zaki, Jeffrey Xu Yu, Balaraman Ravindran and Vikram Pudi (eds.). Springer Berlin Heidelberg, 391–402. http://doi.org/10.1007/978-3-642-13657-3_43
Cao Juan, Xia Tian, Li Jintao, Zhang Yongdong, and Tang Sheng. 2009. A density-based method for adaptive LDA model selection. Neurocomputing — 16th European Symposium on Artificial Neural Networks 2008 72, 7–9: 1775–1781. http://doi.org/10.1016/j.neucom.2008.06.011
Romain Deveaud, Éric SanJuan, and Patrice Bellot. 2014. Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique 17, 1: 61–84. http://doi.org/10.3166/dn.17.1.61-84
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101, suppl 1: 5228–5235. http://doi.org/10.1073/pnas.0307752101
Martin Ponweiser. 2012. Latent Dirichlet Allocation in R. Retrieved from http://epub.wu.ac.at/id/eprint/3558
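The tuning step described above can be sketched with `ldatuning::FindTopicsNumber`. Here `dtm` is assumed to be a DocumentTermMatrix built from the tweet corpus, and the candidate K range, seed, and core count are illustrative values, not the ones used in the original analysis.

```r
library(ldatuning)

result <- FindTopicsNumber(
  dtm,                                   # assumed: DocumentTermMatrix of tweets
  topics  = seq(2, 20, by = 1),          # candidate numbers of topics
  metrics = c("Griffiths2004", "CaoJuan2009",
              "Arun2010", "Deveaud2014"),
  method  = "Gibbs",
  control = list(seed = 1234),
  mc.cores = 4L,                         # CPU cores; more cores, faster runs
  verbose = TRUE
)

# Plot all four metrics against K; choose K where the minimisation
# metrics bottom out and the maximisation metrics peak.
FindTopicsNumber_plot(result)
```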
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 8, based on the method described above for identifying the optimal K value. The model's “beta” matrix was used to examine per-topic-per-word probabilities.
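A minimal sketch of fitting the model and extracting the per-topic-per-word (“beta”) probabilities, assuming `dtm` is the same DocumentTermMatrix as above and the seed is illustrative.

```r
library(topicmodels)
library(tidytext)
library(dplyr)

# Fit an LDA model with K = 8 topics.
lda_model <- LDA(dtm, k = 8, control = list(seed = 1234))

# One row per topic-word pair, with the per-topic-per-word probability.
topic_terms <- tidy(lda_model, matrix = "beta")

# Ten most probable words for each topic, for the plots shown here.
top_terms <- topic_terms %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  arrange(topic, -beta)
```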
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix to examine per-topic-per-word probabilities.
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix to examine per-topic-per-word probabilities.
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix to examine per-topic-per-word probabilities.
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 6. SABC News had the most tweets overall and was the only media house whose model supported 6 topics, as the other media houses yielded mixed results. This model uses a “beta” matrix to examine per-topic-per-word probabilities.
This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix to examine per-topic-per-word probabilities.